Smart Paper Notes

Shopping Queries Dataset: A Large-Scale ESCI Benchmark for Improving Product Search

Chandan K. Reddy, Lluís Màrquez, Fran Valero, Nikhil Rao, Hugo Zaragoza, Sambaran Bandyopadhyay, Arnab Biswas, Anlu Xing, Karthik Subbian

Category: Machine Learning

2022-06-14

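Improving the quality of search results can significantly enhance users' experience and engagement with search engines. Despite recent advances in machine learning and data mining, correctly classifying items for a particular user search query remains a long-standing challenge with considerable room for improvement. This paper introduces the "Shopping Queries Dataset", a large dataset of Amazon search queries and results, intended to foster research on improving the quality of search results. The dataset contains approximately 130 thousand unique queries and 2.6 million manually labeled (query, product) relevance judgements. The dataset is multilingual, with queries in English, Japanese, and Spanish. The Shopping Queries Dataset is being used in one of the KDDCup'22 challenges. In this paper, we describe the dataset and present three evaluation tasks along with baseline results: (i) ranking the results list, (ii) classifying product results into relevance categories, and (iii) identifying substitute products for a given query. We anticipate that this data will become the gold standard for future research on product search.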
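
The three tasks can be read as three views of the same labeled (query, product) pairs. The following is only a minimal sketch: it assumes the four ESCI labels (Exact, Substitute, Complement, Irrelevant), and the gain values, query IDs, and product IDs are hypothetical placeholders rather than values from the dataset. The ranking shown is an oracle ordering derived from the labels, not a learned model.

```python
# Assumed label set: E(xact), S(ubstitute), C(omplement), I(rrelevant).
# The gain values are illustrative placeholders, not taken from the abstract.
GAIN = {"E": 3, "S": 2, "C": 1, "I": 0}

# Hypothetical (query_id, product_id, label) judgements.
judgements = [
    ("q1", "p1", "S"),
    ("q1", "p2", "E"),
    ("q1", "p3", "I"),
    ("q1", "p4", "C"),
]

def rank_products(rows, query):
    """Task (i): order a query's products by descending label gain."""
    hits = [(p, lab) for q, p, lab in rows if q == query]
    return [p for p, lab in sorted(hits, key=lambda r: -GAIN[r[1]])]

def relevance_class(label):
    """Task (ii): the label itself is the relevance category to predict."""
    return label

def is_substitute(label):
    """Task (iii): binary view of the same labels."""
    return label == "S"

ideal_order = rank_products(judgements, "q1")
```

A model for task (i) would be scored by how closely its ordering matches `ideal_order`; tasks (ii) and (iii) are ordinary multi-class and binary classification over the same rows.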

translated by Google Translate

API-Spector: an API-to-API Specification Recommendation Engine

Sae Young Moon, Fran Silavong, Sean Moran

Category: Artificial Intelligence

2022-12-14

When designing a new API for a large project, developers need to make smart design choices so that their code base can grow sustainably. To ensure that new API components are well designed, developers can learn from existing API components. However, the lack of a standardized method for comparing API designs makes this learning process time-consuming and difficult. To address this gap we developed API-Spector, to the best of our knowledge one of the first API-to-API specification recommendation engines. API-Spector retrieves relevant specification components written in OpenAPI (a widely adopted language used to describe web APIs). API-Spector makes several significant contributions, including: (1) novel methods of processing and extracting key information from OpenAPI specifications, (2) innovative feature extraction techniques that are optimized for the highly technical API specification domain, and (3) a novel log-linear probabilistic model that combines multiple signals to retrieve relevant and high-quality OpenAPI specification components given a query specification. We evaluate API-Spector on both quantitative and qualitative tasks and achieve an overall 91.7% recall@1 and 56.2% F1, which surpasses baseline performance by 15.4% in recall@1 and 3.2% in F1. Overall, API-Spector will allow developers to retrieve relevant OpenAPI specification components from a public or internal database in the early stages of the API development cycle, so that they can learn from existing established examples and potentially identify redundancies in their work. It provides the guidance developers need to accelerate the development process and contribute thoughtfully designed APIs that promote code maintainability and quality.
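
The abstract does not give the form of the log-linear model, so the following is only a minimal sketch of a generic log-linear ranker together with a recall@1 helper. The signal names (`text_sim`, `structure_sim`, `quality`), the weights, and the candidate components are hypothetical and are not taken from API-Spector.

```python
import math

def log_linear_scores(candidates, weights):
    """Return P(candidate | query) proportional to exp(sum_k w_k * f_k)."""
    # Unnormalized log-scores: weighted sum of each candidate's signals.
    logits = {
        name: sum(weights[k] * feats[k] for k in weights)
        for name, feats in candidates.items()
    }
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(logits.values())
    exps = {name: math.exp(v - m) for name, v in logits.items()}
    z = sum(exps.values())
    return {name: v / z for name, v in exps.items()}

def recall_at_1(ranked_lists, relevant):
    """Fraction of queries whose top-ranked item is a relevant one."""
    hits = sum(1 for q, ranking in ranked_lists.items() if ranking[0] in relevant[q])
    return hits / len(ranked_lists)

# Hypothetical candidate components with made-up signal values.
candidates = {
    "GET /users/{id}": {"text_sim": 0.9, "structure_sim": 0.7, "quality": 0.8},
    "POST /orders": {"text_sim": 0.2, "structure_sim": 0.4, "quality": 0.9},
}
weights = {"text_sim": 2.0, "structure_sim": 1.0, "quality": 0.5}

probs = log_linear_scores(candidates, weights)
best = max(probs, key=probs.get)
```

In a log-linear model the weights decide how much each signal contributes, so a high-quality but textually unrelated component can still be outranked by a closer match.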

ALANNO: An Active Learning Annotation System for Mortals

Josip Jukić, Fran Jelenić, Miroslav Bićanić, Jan Šnajder

Category: Machine Learning

2022-11-11

In today's data-driven society, supervised machine learning is rapidly evolving, and the need for labeled data is increasing. However, the process of acquiring labels is often expensive and tedious. For this reason, we developed ALANNO, an open-source annotation system for NLP tasks powered by active learning. We focus on the practical challenges in deploying active learning systems and try to find solutions to make active learning effective in real-world applications. We support the system with a wealth of active learning methods and underlying machine learning models. In addition, we leave open the possibility to add new methods, which makes the platform useful for both high-quality data annotation and research purposes.

Topical: Learning Repository Embeddings from Source Code using Attention

Agathe Lherondelle, Yash Satsangi, Fran Silavong, Shaltiel Eloul, Sean Moran

Category: Artificial Intelligence

2022-08-19

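Machine learning on source code (MLOnCode) promises to transform how software is delivered. By mining the context and relationships between software artefacts, MLOnCode augments software developers' capabilities with code auto-generation, code recommendation, code auto-tagging, and other data-driven enhancements. For many of these tasks a script-level representation of code is sufficient. In many cases, however, a repository-level representation that takes into account various dependencies and the repository structure is imperative, for example when auto-tagging repositories with topics or auto-documenting repository code. Existing methods for computing repository-level representations suffer from (a) reliance on natural language documentation of code (for example, README files) and (b) naive aggregation of method-level or script-level representations, for example by concatenation or averaging. This paper introduces Topical, a deep neural network that generates repository-level embeddings of publicly available GitHub code repositories directly from source code. Topical incorporates an attention mechanism that projects the source code, the full dependency graph, and the script-level textual information into a dense repository-level representation. To compute the repository-level representations, Topical is trained to predict the topics associated with a repository, on a dataset of publicly available GitHub repositories that were crawled along with their ground-truth topic tags. Our experiments show that the embeddings computed by Topical outperform multiple baselines, including baselines that naively combine method-level representations through averaging or concatenation, on the task of repository auto-tagging.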
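
To illustrate the contrast between naive averaging and attention-based aggregation, here is a minimal sketch of attention pooling over script-level vectors. It is not Topical's architecture: the two-dimensional script vectors and the fixed `query` vector are made-up assumptions, and a real model would learn the query and use much larger embeddings.

```python
import math

def mean_pool(vectors):
    """Naive baseline: average the script-level vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def attention_pool(vectors, query):
    """Weight each script vector by softmax(dot(vector, query))."""
    scores = [sum(v_i * q_i for v_i, q_i in zip(v, query)) for v in vectors]
    # Softmax with max-subtraction for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Weighted sum of the script vectors, one output per dimension.
    pooled = [
        sum(w * v[d] for w, v in zip(weights, vectors))
        for d in range(len(query))
    ]
    return pooled, weights

# Hypothetical script-level embeddings for one repository.
scripts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]  # assumed fixed here; learned in a real model

repo_mean = mean_pool(scripts)
repo_vec, attn = attention_pool(scripts, query)
```

Averaging treats every script equally, whereas the attention weights let scripts that align with the query dominate the repository vector.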

Online 3D Bin Packing Reinforcement Learning Solution with Buffer

Aaron Valero Puche, Sukhan Lee

Category: Robotics | Artificial Intelligence

2022-08-15

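The 3D Bin Packing Problem (3D-BPP) is one of the most in-demand yet challenging problems in industry: an agent must pack variable-size items, delivered in sequence, into a finite bin so as to maximize space utilization. It is a strongly NP-hard optimization problem, and no solution offered to date achieves high space utilization. In this paper, we present a new reinforcement learning (RL) framework for a 3D-BPP solution with improved performance. First, a buffer is introduced to allow multi-item action selection. Increasing the degrees of freedom in action selection makes it possible to derive a more complex policy that yields better packing performance. Second, we propose an agnostic data augmentation strategy that exploits both bin and item symmetries to improve sample efficiency. Third, we implement a model-based RL method adapted from the popular AlphaGo algorithm, which has shown superhuman performance in zero-sum games. Our adaptation is capable of working in single-player and score-based environments. Although AlphaGo variants are known to be computationally heavy, we manage to train the proposed framework with a single thread and a single GPU, while obtaining a solution that outperforms state-of-the-art results in space utilization.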
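
The effect of the buffer on the action space, and the space utilization objective, can be sketched as follows. This is a simplified illustration under stated assumptions: items are boxes given as `(length, width, height)`, only axis-aligned rotations are considered, and placement positions are left out entirely. The function names are hypothetical and do not come from the paper.

```python
from itertools import permutations

def buffer_actions(buffer):
    """Enumerate (item_index, orientation) choices for items in the buffer."""
    actions = []
    for idx, dims in enumerate(buffer):
        # Axis-aligned rotations: distinct permutations of (l, w, h).
        for orient in sorted(set(permutations(dims))):
            actions.append((idx, orient))
    return actions

def space_utilization(packed_items, bin_dims):
    """Packed volume divided by bin volume."""
    bin_volume = bin_dims[0] * bin_dims[1] * bin_dims[2]
    packed = sum(l * w * h for l, w, h in packed_items)
    return packed / bin_volume

# Hypothetical buffer holding two upcoming items.
buffer = [(2, 2, 2), (1, 2, 3)]
actions = buffer_actions(buffer)
no_buffer_actions = buffer_actions(buffer[:1])

utilization = space_utilization(buffer, (4, 4, 4))
```

With only the next item visible the agent has a single choice here, while the two-item buffer offers seven (one orientation of the cube plus six of the second box), which is the added freedom in action selection that the buffer provides.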

A Novel IoT-based Framework for Non-Invasive Human Hygiene Monitoring using Machine Learning Techniques

Md Jobair Hossain Faruk, Shashank Trivedi, Mohammad Masum, Maria Valero, Hossain Shahriar, Sheikh Iqbal Ahamed

Category: Machine Learning

2022-07-07

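People's personal hygiene habits say a great deal about how they care for their bodies and health in daily life. Maintaining good hygiene practices not only reduces the chance of contracting a disease but can also reduce the risk of spreading illness within the community. Given the current pandemic, daily habits such as washing hands or taking regular showers have taken on primary importance, especially for elderly people living alone at home or in assisted living facilities. This paper presents a novel, non-invasive framework for monitoring human hygiene using vibration sensors and machine learning techniques. The approach is based on a combination of a geophone sensor, a digitizer, and a cost-efficient computer board in a practical enclosure. Monitoring daily hygiene routines may help healthcare professionals be proactive rather than reactive in identifying and controlling the spread of potential outbreaks within the community. The experimental results indicate that a Support Vector Machine (SVM) used for binary classification achieves a promising accuracy of about 95% in classifying different hygiene habits. Furthermore, the tree-based classifiers (Random Forest and Decision Tree) outperform the other models, achieving the highest accuracy (100%), which means that hygiene events can be classified using vibration-based, non-invasive sensors to monitor hygiene activity.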

Malware Detection and Prevention using Artificial Intelligence Techniques

Md Jobair Hossain Faruk, Hossain Shahriar, Maria Valero, Farhat Lamia Barsha, Shahriar Sobhan, Md Abdullah Khan, Michael Whitman, Alfredo Cuzzocreak, Dan Lo, Akond Rahman

Category: Artificial Intelligence | Machine Learning

2022-06-26

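With rapid technological advancement, security has become a major issue because of the increase in malware activity, which poses a serious threat to the security and safety of both computer systems and stakeholders. To maintain the security of stakeholders, particularly end users, protecting data from fraudulent efforts is one of the most pressing concerns. Malware refers to a set of malicious programming code, scripts, active content, or intrusive software designed to damage targeted computer systems and programs or mobile and web applications. According to one study, naive users are unable to distinguish between malicious and benign applications. Computer systems and mobile applications should therefore be designed to detect malicious activity in order to protect stakeholders. A number of algorithms are available to detect malware activity using novel concepts including artificial intelligence, machine learning, and deep learning. In this study, we focus on artificial intelligence (AI) based techniques for detecting and preventing malware activity. We present a detailed review of current malware detection technologies, their shortcomings, and ways to improve their efficiency. Our study shows that adopting futuristic approaches to developing malware detection applications should provide significant advantages. An understanding of this synthesis should help researchers pursue further work on malware detection and prevention using AI.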

Orthonormal Convolutions for the Rotation Based Iterative Gaussianization

Valero Laparra, Alexander Hepburn, J. Emmanuel Johnson, Jesús Malo

Category: Computer Vision

2022-06-08

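In this paper we elaborate an extension of rotation-based iterative Gaussianization (RBIG) that makes image Gaussianization possible. Although RBIG has been successfully applied to many tasks, it is limited to medium-dimensional data (on the order of a thousand dimensions). In images, its application has been restricted to small image patches or isolated pixels, because the rotation in RBIG is based on principal or independent component analysis, and these transformations are difficult to learn and scale. Here we present Convolutional RBIG: an extension that alleviates this issue by requiring the rotation in RBIG to be a convolution. We propose to learn convolutional rotations (i.e., orthonormal convolutions) by optimizing the reconstruction of the input from an approximate inverse of the transformation computed with the transposed convolution operation. Additionally, we suggest different regularizers for learning these orthonormal convolutions. For example, imposing sparsity on the activations leads to a transformation that extends convolutional independent component analysis to multilayer architectures. We also highlight how statistical properties of the data, such as multivariate mutual information, can be obtained from Convolutional RBIG. We illustrate the behavior of the transform with a simple texture synthesis example, and analyze its properties by visualizing the stimuli that maximize the response in certain features and layers.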
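
One way to write the reconstruction objective described above is sketched below. The notation is an assumption introduced for illustration: $x$ is an input image, $C_w$ is a convolution with kernel $w$, and $C_w^{\top}$ is the transposed convolution with the same kernel. The choice of a squared $\ell_2$ norm is also an assumption; the abstract does not specify the loss.

```latex
% Assumed notation: x is an input image, C_w a convolution with kernel w,
% and C_w^{\top} the transposed convolution with the same kernel.
\mathcal{L}(w) = \mathbb{E}_{x}\left[ \left\| x - C_w^{\top}\!\left( C_w\, x \right) \right\|_2^2 \right]
```

If the convolution were exactly orthonormal, then $C_w^{\top} C_w = I$ and the loss would be zero, which is why minimizing it pushes the convolution toward behaving like a rotation.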

Neural Networks with Divisive normalization for image segmentation with application in cityscapes dataset

Pablo Hernández-Cámara, Valero Laparra, Jesús Malo

Category: Computer Vision | Machine Learning

2022-03-25

One of the key problems in computer vision is adaptation: models are too rigid to follow the variability of the inputs. The canonical computation that explains adaptation in sensory neuroscience is divisive normalization, and it has appealing effects on image manifolds. In this work we show that including divisive normalization in current deep networks makes them more invariant to non-informative changes in the images. In particular, we focus on U-Net architectures for image segmentation. Experiments show that including divisive normalization in the U-Net architecture leads to better segmentation results than the conventional U-Net. The gain increases steadily when dealing with images acquired in bad weather conditions. In addition to the results on the Cityscapes and Foggy Cityscapes datasets, we explain these advantages through visualization of the responses: the equalization induced by divisive normalization yields features that are more invariant to local changes in contrast and illumination.
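
The invariance to contrast changes can be illustrated with a minimal sketch of one common form of divisive normalization, where each response is divided by a constant plus a weighted pool of the magnitudes of neighboring responses. The exact formulation used in the paper is not given in the abstract, so the uniform pooling weights, the `beta` constant, and the toy inputs below are all assumptions.

```python
def divisive_normalization(x, weights, beta=0.1):
    """y_i = x_i / (beta + sum_j weights[i][j] * |x_j|)."""
    out = []
    for i, x_i in enumerate(x):
        # Pool the magnitudes of all responses with row i of the weights.
        pool = sum(w * abs(x_j) for w, x_j in zip(weights[i], x))
        out.append(x_i / (beta + pool))
    return out

n = 3
# Assumed uniform pooling: every unit averages over all units.
uniform = [[1.0 / n] * n for _ in range(n)]

low_contrast = [0.5, 1.0, 1.5]
high_contrast = [5.0, 10.0, 15.0]  # same pattern, ten times the contrast

y_low = divisive_normalization(low_contrast, uniform)
y_high = divisive_normalization(high_contrast, uniform)
ratios = [h / l for h, l in zip(y_high, y_low)]
```

In this toy case a tenfold change in input contrast changes the normalized responses by less than 10%, because the pooled denominator grows with the input.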

Topic Modeling on Podcast Short-Text Metadata

Francisco B. Valero, Marion Baranes, Elena V. Epure

Category: Natural Language Processing

2022-01-12

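Podcasts have emerged as a massively consumed form of online content, notably because of the wider accessibility of production tools and scaled distribution through large streaming platforms. Categorization systems and information access technologies typically use topics as the primary way to organize or navigate podcast collections. However, annotating podcasts with topics remains quite problematic, because the assigned editorial genres are broad, heterogeneous, or misleading, or because of data challenges (e.g., short metadata text, noisy transcripts). Here, we assess the feasibility of discovering relevant topics from podcast metadata, namely titles and descriptions, using topic modeling techniques. We also propose a new strategy to leverage named entities (NEs), which are often present in podcast metadata, within a Non-negative Matrix Factorization (NMF) topic modeling framework. Our experiments on two existing datasets from Spotify and iTunes, and on Deezer, a new dataset from an online service providing a podcast catalog, show that our proposed document representation, NEiCE, improves topic coherence over the baselines. We release the code for experimental reproducibility of the results.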
